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Abstract 

In this paper we revisit the sparse multiple measurement vector (MMV) problem, where the aim is to recover 
a set of jointly sparse multichannel vectors from incomplete measurements. This problem has received increasing 
, interest as an extension of single channel sparse recovery, which lies at the heart of the emerging field of compressed 

sensing. However, MMV approximation has origins in the field of array signal processing as we discuss in this paper 
Inspired by these links, we introduce a new family of MMV algorithms based on the well-know MUSIC method 
' in array processing. We particularly highlight the role of the rank of the unknown signal matrix X in determining 

\^ • the difficulty of the recovery problem. We begin by deriving necessary and sufficient conditions for the uniqueness 

of the sparse MMV solution, which indicates that the larger the rank of X the less sparse X needs to be to ensure 
uniqueness. We also show that as the rank of X increases, the computational effort required to solve the MMV 
problem through a combinatorial search is reduced. 
C/3 , In the second part of the paper we consider practical suboptimal algorithms for MMV recovery. We examine the 

^ ' rank awareness of popular methods such as Simultaneous Orthogonal Matching Pursuit (SOMP) and mixed norm 

minimization techniques and show them to be rank blind in terms of worst case analysis. We then consider a family 
^ ' of greedy algorithms that are rank aware. The simplest such method is a discrete version of the MUSIC algorithm 

' popular in array signal processing. This approach is guaranteed to recover the sparse vectors in the full rank MMV 

l/^ ■ setting under mild conditions. We then extend this idea to develop a rank aware pursuit algorithm that naturally 

. reduces to Order Recursive Matching Pursuit (ORMP) in the single measurement case. This approach also provides 

, guaranteed recovery in the full rank setting. Numerical simulations demonstrate that the rank aware techniques are 

, significantly better than existing methods in dealing with multiple measurements. 
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I. Introduction 

Sparse signal representations provide a general signal model that represents or approximates a signal using a 
linear combination of a small number of elementary waveforms (called atoms) selected from a large collection (the 
dictionary). Such models make it possible to solve many ill-posed problems such as source separation, denoising 

M. E. Davies is with the Institute for Digital Communication, Edinburgh University, Edinburgh EH9 3JL, U.K. (e-mail: mike.davies@ed.ac.uk). 
MED acknowledges support of his position from the Scottish Funding Council and their support of the Joint Research Institute in Signal and 
Image Processing with the Heriot-Watt University as a component of the Edinburgh Research Partnership. This work was supported in part by the 
SMALL project with financial support of the Future and Emerging Technologies (FET) programme within the Seventh Framework Programme 
for Research of the European Commission, under FET-Open grant number: 225913. 

Y. Eldar is with the Technion— Israel Institute of Technology, Haifa 32000, Israel. Phone: +972-4-8293256, fax: +972-4-8295757, E-mail: 
yonina@ee.technion.ac.il. She is currently on leave at Stanford, USA. Her work was supported in part by the Israel Science Foundation under 
Grant no. 1081/07 and by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications 
NEWCOM++ (contract no. 216715). 



April 27, 2010 



DRAFT 



2 



and most recently compressed sensing [1], [2] by exploiting the additional sparsity constraint. The key point is 
that when the signal, x, is sufficiently sparse it can still be uniquely determined from an underdetermined set of 
measurements y = *x, where * e M"*^" and m <n. 

The problem of finding the sparsest x consistent with a given observation vector y is known to be NP-hard in 
general [3], [4] and therefore is presumed to not be solvable in polynomial time. Instead, various suboptimal 
strategies have been proposed and have been demonstrated, both empirically and theoretically, to have good 
reconstruction performance in a range of settings. Commonly used strategies are typically based on convex relaxation 
[5], non-convex local optimisation [6] or greedy search strategies [3], [4], [7], [8], [9]. 

Encouraged by the potential power of sparse representations, researchers have begun to consider a number 
of extensions to the basic sparse representation model. These include the multiple measurement vector (MMV) 
problem [10], [11], [12], [13], as well as other union of subspace models [14], [15] such as block sparsity or 
tree structured sparsity [15], [16], [17] and bUnd compressed sensing [18]. These ideas have also been recently 
expanded to include sub-Nyquist sampUng of structured analog signals [19], [20], [21], [22], [23]. As with the 
single measurement vector problem (SMV) several suboptimal methods for finding a sparse matrix have been 
proposed, that have polynomial complexity [24], [25], [26], [10], [11], [12], [16], [15], [13]. These approaches are 
generally straightforward extensions of existing SMV solutions and can be roughly divided into greedy methods, 
and algorithms based on mixed norm optimization. We will discuss these two classes in Section V. One exception to 
this is the approach in [12] which reduces the MMV problem to a single channel recovery via a random projection 
that preserves the sparsity pattern. 

A variety of different equivalence results between finding the sparsest solution, the so-called ^o-problem, and the 
output of the proposed efficient algorithms have also been derived. In [11] an equivalence result was obtained for 
a mixed ip^i program in which the objective is to minimize the sum of the £p-nonas of the rows of the estimated 
matrix whose colunms are the unknown vectors. The condition is based on mutual coherence, and turns out to be 
the same as that obtained from a single measurement problem, so that the joint sparsity pattern does not lead to 
improved recovery capabilities as judged by this condition. Recovery results for the more general problem of block- 
sparsity were developed in [15] based on the RIP, and in [16] based on mutual coherence. Reducing these results 
to the MMV setting leads again to conditions that are the same as in the single measurement case. An exception 
is the work in [27], [28], [13] which considers average case performance assuming that X is generated at random 
from an appropriate distribution. Under a mild condition on the sparsity and on the matrix the probabiUty of 
reconstruction failure decays exponentially with the number of channels /. However, to date, all worst case recovery 
results have not shown any advantage to the MMV setting over the SMV case. The reason is that in the worst 
case, the matrix X may be comprised of a single repeated vector x, in which case effectively the MMV and SMV 
problems becomes identical, hence the worst case results are not capable of improving the recovery guarantees. 

As noted above, one approach to demonstrate the advantage of the MMV formulation is to use an average case 
analysis where X is generated at random. Since repeated columns are not likely to occur, this strategy allows for 
improved recovery guarantees. In this paper, we concentrate on worst-case performance, as in the bulk of prior work 
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on MMV problems. We show that we can break the worst-case analysis bottleneck by exploiting the rank of the 
matrix X. Although in the case of a rank-one matrix we cannot do better than in SMV recovery, this is no longer 
true when X has higher rank. In particular, when the rank of X is equal to k, we highhght the fact that it can be 
recovered exactly from the measurements Y under a mild condition on * using polynomial time algorithms from 
only m = k + 1 measurements per signal [29] based on the MUSIC algorithm popular in array signal processing 
[30] . Clearly this is a big advantage over the SMV problem which cannot guarantee perfect recovery for general 
* and all x with such few measurements. Even using combinatorial algorithms, recovery of all x is possible only 
if the number of measurements is at least 2k. 

Interestingly, the Unks between sparse representations and array signal processing were brought forward early 
on in the sparse reconstruction hterature [6] and the MMV problem in particular has origins in the field of array 
signal processing. However, the algorithmic connections and particularly the role of the rank of X, appear to have 
been missed. 

The main contribution of this paper is to demonstrate how the rank of X can be exploited in order to improve 
MMV recovery results in the worst-case setting. We begin by examining the conditions for which the equation 
Y = *X has a single sparse solution and, in Section III, we derive a necessary and sufficient condition for 
uniqueness and show that it depends directly on the rank of X: the larger the rank the less sparse X needs to be to 
still ensure uniqueness. Similarly, in Section IV, we show that the computational effort required to find the unique 
sparse solution through a combinatorial search is also dependent on the rank of X, and is reduced when the rank 
increases. In Sections V and VI we turn to discuss polynomial time recovery algorithms. We begin by showing that 
the common MMV methods are not rank aware, namely, they do not efficiently exploit the rank of X to improve 
recovery results. In particular, in the full rank case in which the rank of X is equal to k (which is the largest it 
can be) we show that almost all previous greedy and optimization-based MMV algorithms do not provide recovery 
for the worst-case choice of X from A; -|- 1 measurements. Moreover, independent of the rank of X, one can find 
examples that perform almost as badly as worst case SMV problems. We proceed to propose some rank aware 
algorithms inspired by the popular MUSIC technique, whose behavior improves with increasing rank of X, and are 
proven to provide exact recovery for all choices of X when the rank is equal to k, and the number of measurements 
is fc -h 1 (under mild conditions on Finally, in Section VII we present several simulations demonstrating the 
behavior of these different methods. 



Throughout this paper a coefficient vector x is said to be fc-sparse if the size of the support of x is no larger 
than k: \ supp(x)| < k. We define the support of a collection of vectors X = [xi, . . . , x;] as the union over all the 
individual supports: 



II. Notation and Problem Formulation 



A. Notation 




(1) 
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A matrix X is called k joint sparse if | supp(X) | < fc. In other words, there are at most k rows in X that contain 
nonzero elements. The set of k joint sparse matrices of size n x Z is defined as 

Jk := {X e M"><' : I supp(X)| < k}. (2) 

We make use of the subscript notation xq to denote a vector that is equal to some x on the index set f2 and 
zero everywhere else. Denoting by |n| the cardinality of fi, the vector xq is |i7|-sparse. For index sets that are 
sequences of indices, e.g. 1, . . . ,k we use the Matlab style notation 1 : k. 

We say that the support of x lies within fl whenever xq = x. For matrices the subscript notation wiU 
denote a submatrix composed of the columns of $ that are indexed in the set while the notation X^ : denotes 
a row-wise submatrix composed of the rows of X indexed by il. We denote the ith column of a matrix, by <p^, 
and use JV{^) for the nuU space and Tl{^) for the range of the matrix 

Throughout the paper, we also require a variety of different norms. We denote by ||x||p, p > 1 the usual £p norms, 
and by ||x||o the £° quasi-norm that counts the number of non-zero elements of x so that ||x||o = | supp(x)|. For 
matrices we define the ip^q norms as: 




where, with slight abuse of notation, we also consider the quasi-norms with p = such that ||X||o,g = | supp(X)| 
for any q. 

B. MMV sparse recovery problem 

We are interested in solving the sparse recovery problem associated with Multiple Measurement Vectors (MMV), 
in which the goal is to recover a jointly sparse matrix X of size nxl from m <n measurements per channel. Here 
I denotes the number of channels, or signals. This is a generalization of the standard Single Measurement Vector 
(SMV) problem that has been examined in detail, e.g. [31], [7]. Formally, the problem is defined as follows: 

Deiuiition 1 (MMV sparse recovery problem). Given Y e M"*^' and * e ^mxn ^^-^/i m <n find: 

X = argmin | supp(X)| s.t. *X = Y. (4) 

X 

The SMV problem can be recovered as a special case with 1 = 1. 

Throughout the paper we wiU assume that the dictionary * has unit norm colunms, \\<Pi\\2 = 1. we will also 
focus on the ideal model in which the measurements Y are noiseless, and X is strictly sparse. We do not treat the 
case in which the signals depart from the exact sparse model or any possible measurement errors, although such 
issues are clearly very important in practice. 

III. MMV Uniqueness 

While the MMV problem reduces to the SMV one when each observation vector Yj_: is cohnear, in general, the 
additional measurement vectors should provide further information to the problem and make it easier to determine 
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the support set. The question is: by how much? In this section we will focus on uniqueness conditions and see 
how the rank of X, or alternatively that of Y, can be used to improve the uniqueness conditions. In later sections 
we will use the rank to develop more efficient recovery algorithms with improved worst-case guarantees over the 
SMV setting. 

Sufficient conditions for the uniqueness of the MMV sparse recovery problem were given in [11]. Here we 
complete this result by showing that such a condition is also necessary. 

For the SMV problem it is well known that a necessary and sufficient condition for the measurements y = *x 
to uniquely determine each fc- sparse vector x is given by 

^^spa|(5) 

where the spark of * is defined as the smallest number of columns of * that are hnearly dependent. Since 
spark(*) < m + 1, we have inraiediately that m > 2k, namely, at least 2k measurements are needed to ensure 
uniqueness in the SMV case for aU possible choices of x. This also defines a necessary and sufficient condition 
for the general MMV sparse recovery problem, since one instance of this problem is any SMV problem repUcated 
I times [11]. 

Chen and Huo [11] showed that when rank(Y) > 1 the sufficient condition for uniqueness in the MMV sparse 
recovery problem can be relaxed by exploiting the rank of Y, as incorporated in the following theorem. 

Theorem 1 (Chen and Huo [11]). A sufficient condition from the measurements Y = *X, |supp(X)| = k, to 
uniquely determine the jointly sparse matrix X e jTit is 

, spark(*) - 1 + rank(Y) 

I supp(X) I < ^ — ■ (6) 

This condition was further shown in [12] to hold even in the case where the are infinitely many vectors y,. A 
direct consequence of this result is that matrices X which result in matrices Y with larger rank, can be recovered 
from fewer measurements. Alternatively, matrices X with larger support can be recovered from the same number 
of measurements. Since rank(X) < k, it is obvious that rank(Y) < k. When rank(Y) = k and spark(*) takes 
on its largest value of m + 1, the condition (6) becomes m > fc + 1. Therefore, in this best-case scenario, only 
fc -|- 1 measurements per signal are needed to ensure uniqueness. This is much lower than the value of 2k obtained 
in the SMV setting. 

Chen and Huo note that it would also be interesting to bound k in terms of rank(X) instead of rank(Y). 
Naturally we have that rank(Y) < rank(X). As we show in the next lenmia, we can actually replace rank(Y) by 
rank(X) in the condition (6): 

Lemma 1. The sufficient condition of (6) is equivalent to 

, spark($) - 1 + rank(X) 

I supp(X) I < ^ . (7) 

Proof: Since rank(Y) < rank(X), (6) automatically implies (7). For the reverse direction, suppose that (7) 
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holds. Since rank(X) < | supp(X)|, we have from (7) that 

I supp(X)| < spark(#) — 1. 



(8) 



So *a , Q = supp(X), must be full rank. This implies that rank(X) = rank(Y) and consequently (7) implies 
(6). ■ 
We now show that both (7) and (6) are necessary and sufficient for uniqueness in the MMV problem. 

Theorem 2. Condition (7), or equivalently (6), is a necessary and sufficient condition for the measurements Y = 
*X to uniquely determine the jointly sparse matrix X e Jk- 

An immediate consequence of the theorem is that in order to have a unique X with support set of size k when 
X has full rank (i.e. rank(X) = k), it is enough to take m ^ k + 1 measurements, as long as spark($) > k + 2, 
namely, that every set of fc + 1 columns of $ are linearly independent. 

Interestingly the m > k + 1 bound also occurs in the SMV case, where for almost all dictionaries $ almost all 
k sparse vectors x are uniquely determined by y = ^>x if m > k + 1 (for apropriately defined measures) - see [14, 
Theorem 2.6 1. The key difference in the MMV scenario is that the condition on X that guarantees recovery using 
only k+1 measurements is readily testable from the observed data, namely that rank(Y) = k. 
Proof: Sufficiency follows immediately from Lemma 1 and Theorem 1. 

To show necessity we will show that 2k > spark(4>) — 1 + r implies that there exists an X S M"^' with 
rank(X) = r that is not uniquely determined by Y = $X. 

Suppose that 2k > spark(4>) — 1 + r. Then there exists a support set T , \T\ — 2k ~ t + 1 and a vector v ^ 
such that ^>tv = 0. Let V = [v, . . . , v] be the matrix consisting of I replications of v. We can now construct an 
X with supp(X) c T, as follows: 



V{l:k-T+1,:} 



•■T-1 







{fc-T + lXi} 



(9) 



where Ir-i represents the identity matrix of size r — 1. By construction X is fc-joint sparse and rank(X) = r. 
However we can also define the matrix X, with supp(X) c T, by: 

Xt,: = Xt,: - V. (10) 

By construction, X is also fc-joint sparse, and *X = *X. Therefore, it follows that (7) is also a necessary condition 
for uniqueness. ■ 
Theorem 2 shows that rank(X) plays an important role in uniqueness of MMV problems, since the rank defines 
the dimension of the subspace of sparse coefficients that generate observation vectors. In the ensuing sections we 
wiU further show that rank(X) also plays an important role in the performance of joint sparse recovery algorithms. 
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IV. MMV Recovery: Exhaustive Search 

Our goal now is to show how the rank of X can be used to improve MMV recovery. We begin by considering 
the (generally impractical) exhaustive search approach for X as this most easily allows us to incorporate the rank 
information. We then discuss the fuU rank case for which it is straightforward to develop rank-aware algorithms, 
namely, algorithms that exploit the rank of the measurements, and are guaranteed to recover the true value X from 
only k + 1 measurements. In Section V we show that standard MMV methods are not rank aware. Following which, 
in Section VI, we suggest two rank-aware algorithms that can be applied in the non-full rank scenario as well. 

A. Combinatorial Search 

The sparse recovery problem can naturally be cast in terms of a combinatorial optimization. That is we seek to 
minimize the number of nonzero rows of X subject to the constraint *X = Y. To count the nonzero rows we can 
consider the io norm of the vector consisting of arbitrary norms of the rows of X, leading to the problem: 

X = argmin||X||o,, s.t. *X = Y (11) 

X 

{q arbitrary). If we know k (or a bound on it, say from the uniqueness conditions) then we can alternatively minimize 

X = argmin min ||*X-Y||2 2 (12) 

X n,\n\<k 

Other norms are also possible. The choice of the Frobenius norm is popular in array signal processing [32], as 
it can be related to a maximum likelihood cost function in the presence of Gaussian observation noise. Schemes 
such as Alternating Projection [33] can be used to search for a local minimum of this cost function and have been 
applied to great effect [29], [34]. Alternatively a global minimum for (12) can be found from an exhaustive search 
through all ( ^ ) support sets f2 with \Q\ = k, until we find the *n that minimizes (12). 

B. Full-Rank MMV: MUSIC 

Surprisingly, in the case in which Y has rank equal to k, an exact solution can be found using a very simple 
search algorithm. This method incorporates the rank information of Y to efficiently determine X. We refer to 
this scenario as the full rank case, since, we always have that rank(Y) < rank(X) < k. The last inequality is a 
result of the fact that X is fc-sparse. The same approach can be extended to reduce the complexity of the general 
combinatorial search for arbitrary ranks as well. 

To obtain a rank-aware algorithm for the case when rank(Y) = k, we adapt a discrete version of the MUSIC 
algorithm [30], popular in array signal processing. To the best of our knowledge, the first application of this discrete 
form of MUSIC to an MMV problem was proposed and analyzed by Feng and Bresler [29], [34] in the context of 
multiband sampling. This work substantially predates much of the renewed interest in sparse recovery problems. The 
connection with array processing techniques and rank of the measurements was later exploited in [22] to develop 
recovery methods for time delay estimation from low rate samples. 
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Since by assumption, rank(Y) = k, it follows immediately that 1Z{Y) = TZ{^q). Therefore, we have fuU 
visibihty of TZ{^n) in the given measurements which intuitively can be used to ensure recovery of X. The question 
is how to do this in practice. Here we rely on the MUSIC principle, which exploits the signal subspace properties. 
As we noted above, every correct column 0^ of lies within the range of Y. Assuming that (7) is satisfied, 
from uniqueness, it follows that these are the only columns contained in TZ{Y). Namely, any (p- not in the support 
of *n, will not lie entirely in TZ{Y). Consequently, if we project these vectors onto the orthogonal complement 
of 7^(Y), then the projection will contain energy. We can use this observation to develop a concrete method for 
finding the values of 0j that are in SI. 

Specifically, we first calculate an orthonormal basis U = orth(Y) for '1Z{Y) (for example, by using a singular 
value decomposition (SVD) of Y). Since U is an orthonormal basis of the range of *n each of its columns must 
he within this subspace. Furthermore, by our assumption that (7) is satisfied, no other column of * can lie within 
this subspace. Therefore, mathematically, we have that 



If we therefore select the k atoms that minimize — UU-^)0j||2, then we are guaranteed to find the correct 
support set. Since, by assumption, the columns of $q linearly independent, we can determine X as X = Y. 
As we have already seen, (7) holds as long as m > fc + 1, and all sets of m columns of $ are linearly independent. 

Readers familiar with the array processing literature will recognize the essential ingredients of the MUSIC 
algorithm. In array processing it is more common to compute the inverse of (13) which is often called the MUSIC 
pseudo-spectrum. 

We have seen that in the full rank case the MUSIC algorithm can be used to fully recover X from as little as 
k + 1 measurements per signal. The surprising result is that this is true for any choice of X, namely, this is a 
worst case recovery guarantee. Recovery is obtained using a very simple algorithm that is computationally efficient: 
it requires a single SVD, and computing the norm of several vectors. When rank(Y) < k, we can use similar 
ideas to develop a rank-aware algorithm that will improve as the rank of X increases. We discuss this approach in 
Section VI-A which leads to a rank aware thresholding method. This technique reduces to the MUSIC algorithm 
in the full rank scenario. 

C. Reduced-Rank MMV 

When T = rank(Y) < fc we can still exploit the geometry through a reduced combinatorial search. If r < fc 
then 7^(Y) c TZ{^q) and there must exist at least one subset 7 C such that I7I = fc — r and 0j ^ (Note 
that typically all k — t sized support sets 7 C will satisfy this). Assuming that *a has full rank, we then have 
that 7^([*^, Y]) = TZ{^n)- Since I7I = A; — r we only need to search over subsets of size k — t. Of course there 
are 'only' ( ^-t ) such support sets, so this search still has exponential complexity unless kK, t. 

Specifically let (5(7) be an orthonormal basis for 7?.([*^, Y]). Then an optimal 7 can be found by solving: 



11(1 - UU^)</.J|2 = 0, if and only if i e fi. 



(13) 



7= argmin - Q(7)g(7)^)||o,, 



(14) 



l,\l\=k-T 
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for an arbitrary q. The correct support set can then be recovered by considering the full rank problem associated 
with the augmented measurement matrix Y]. 

Generally the optimal solution to (14) wiU not be unique and there will typically be multiple solutions associated 
with the ( ^'^^ ) subsets of the true support. Interestingly, this presence of multiple equivalent minima here suggests 
that such a problem might be difficult to convexify. 

We also note that for special structures of the matrix the support of X can be recovered exactly even when 
rank(Y) is smaller than k. This is the case, for example, when * consists of exponentials. In this case, the MUSIC 
approach can be replaced by the ESPRIT method, which stands for estimation of signal parameters via rotational 
invariance techniques [35]. However since our emphasis here is on general matrices we do not elaborate more 
on this point. 

Motivated by the fact that a higher dimensional range space of Y in MMV problems plays an important role 
in the identifiabiUty of X and can also help reduce the complexity of the combinatorial search we next examine 
which practical algorithms are able to make use of this rank information and which do not. 

V. Rank-Blind MMV Algorithms 

Despite the fact that the rank of X (and therefore that of Y) plays an important role in the MMV problem, we 
next show that some of the most popular algorithms are effectively rank blind. Namely, they do not allow for perfect 
recovery in the fuU rank case, and furthermore the worst case behaviour of such algorithms approaches that of the 
SMV problem. In practice, as we wiU see in Section VII, such algorithms often exhibit improved performance 
when there are multiple measurements, however, they suffer significantly from not properly exploiting the rank 
information. 

A. Greedy Methods 

Several greedy algorithms have been proposed to treat the MMV problem. Two examples are extensions of 
thresholding and OMP to the multiple measurement case [25], [36], [27], [13], [16]. For 1 < g < oo they produce 
a fc-sparse signal X from measurements Y = *X using a greedy search. 

In g-thresholding, we select a set fl of k indices whose g-correlation with Y are among the k largest. Let 6k be 
the fcth largest g-correlation of any 0^ with Y. Then we have: 

n = {i : UfY\\g > 9k} (15) 
After the support is determined, the non-zero coefficients of X are computed via an orthogonal projection: 

The g-simultaneous OMP (SOMP) algorithm is an iterative procedure where in each iteration, an atom is selected, 
and a residual is updated. The next selected atom is the one which maximizes the g-correlation with the current 
residual. The pseudocode for SOMP is summarized in Algorithm 1. 
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Algorithm 1 Simultaneous Orthogonal Matching Pursuit (SOMP) 
1: initiaUzation: R(o) = Y, X(o) = 0, 0° = 
2: for n = 1; n := n + 1 untU stopping criterion do 

3: i" = argmaxJ|0fR("-i)||, 

4: r2" = r2"-iui" 

6: R(") = Y - *X(") 
7: end for 



Different authors have advocated the use of different values for q in step 3. However, it has been shown [11] 
that, independent of q, SOMP will recover a joint sparse representation with joint support O whenever the Exact 
Recovery Condition (ERC) [7] is met: 

max||*J,</.,|li <1. (16) 

For the SMV problem Tropp showed that the ERC is also necessary to guarantee recovery for all vectors with 
support ri [7, Theorem 3.10]. It turns out the that the necessary condition for SOMP also approaches the ERC, and 
furthermore, this is independent of the rank of X. This implies that the SOMP algorithm is not able to exploit the 
rank information in order to improve the recovery ability in the worst-case, as we show in the theorem below. 

Theorem 3 (SOMP is not rank aware). Let r be given such that 1 < t < k and suppose that 

max||*Uilli>l (17) 

for some support O, |f2| = fc. Then there exists an X with supp(X) = O and rank(X) = r that SOMP cannot 
recover 

Proof: Since the ERC does not hold, we know that for the SMV problem, there exists a vector x for which 
OMP will fail. Let x be a vector with supp(x) = such that OMP incorrectly selects atom j* ^ at the first 
step with 

> max|</)f*a;| +e (18) 

for some e > 0. Let X := [x, x, . . . , x] be the rank 1 matrix associated with x that cannot be recovered from 
Y = *X by SOMP using any q. 

We now perturb X, to give X = X + E, supp(E) = f2, max^ ||0j*E||g < V-/ie/2 such that X has rank r and 
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define Y = *X. We now have: 

||0j.Y||,> ||0j.*X||,-||0j:*E||, 



(19) 

= niax||0f*X||g + /i/V2 



> 



max{||0f*Xl|, + ||0f#E||J 



>nia^||0fY||,. 

Line 1 in (19) is the reverse triangle inequaUty, line 2 follows from the definition of the q norm and the fact that 
X is rank 1, and Une 3 follows from (18). Lines 4 — 6 are a result of reversing these arguments. 

Equation (19) therefore shows that no correct atom will be selected at the first step in the perturbed problem and 
X will not be correctly recovered. ■ 

We conclude from Theorem 3 that SOMP is effectively blind to the rank of X. An identical argument can also 
be used to show that an MMV version of any similar Matching Pursuit type algorithm (e.g. M-MP, M-ORMP [10]) 
will also be rank blind, including identification of the support set by thresholding ||0jY||g as proposed in [27]. 

B. Mixed (} ji'^ minimization 

Another popular joint sparse recovery algorithm is to perform mixed norm minimization: 

X = argmin ||X||i,g s.t. *X = Y (20) 

X 

for some (J > 1 (values of g = 1,2 and oo have been advocated). This is a simple extension of the (} minimization 
used to solve SMV problems. In SMV the necessary and sufficient condition for the recovery of vectors x with 
support is given by the A^m// Space Property (see [31], [37], [38]): 

ll^alli < ||-2a=||i. V^eAr(*). (21) 

Here 17"^ is the complement of the set O. 

As with SOMP we can leverage the SMV conditions for recovery to show that mixed norm methods are not 
rank aware: 

Theorem 4 (f^/f* minimization is not rank aware). Let r he given such that 1 < r < fc and suppose that there 
exists a z G Af{^) such that 

\\zq\\i > \\znc\\i (22) 
for some support Cl, \fl\ = k. Then there exists an X with supp(X) = fi, rank(X) = r that (20) cannot recover. 

Proof: The proof follows along the same lines as Theorem 3. Let X = [x, x, . . . ,x] be the rank 1 matrix 
such that (} minimization fails to recover x. Denote the ^ minimum solution by x and define X = [x, x, . . . , x]. 
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Suppose that 



|X||l_g > ||X||l^g + e 



(23) 



for some e > 0. We now perturb X by a matrix E, X = X + E, such that supp(E) = fi, rank(X) = r and 
||E||i,q < e/2. Therefore, the solution X of (20) for the perturbed problem must satisfy: 



||X||i,, < ||X||i,, + ||E||i,, < ||X||i,, + e/2 < ||X||i,, - e/2. 



(24) 



On the other hand, by the triangle inequality. 



||X||i,,>||X||i,,-||E||i,,>||X||i,,-e/2. 



(25) 



Therefore, ||X||i^q > ||X||i^g so (20) fails to recover the correct solution. 



The argument in the proof of the theorem can be apphed to any joint sparse recovery that uses 1 1 • | \p^q for p < 1, 
using the appropriate Null Space Property associated with the p-quasi-norm [37]. Thus the result also appUes to the 
M-FOCUSS family of algorithms [10] that define a specific means of (locally) minimizing the {p, g'}-quasi-norms. 

We conclude this section by noting that both greedy methods, and mixed norm approaches, are not rank aware: 
in the worst-case, their recovery conditions are the same as the SMV problem so that the rank cannot be exploited 
to improve worst-case performance. In the next section we will develop new classes of algorithms that are rank 
aware: in the full rank case they guarantee recovery from k + 1 measurements, and in the non full rank case, their 
performance degrades gracefully with the rank. 



In the last section we noted that two classes of popular techniques for joint sparse recovery were effectively rank 
blind. We now examine methods that can be shown to be rank aware. In particular, we consider two algorithms, 
both of which can be categorized as "greedy", and are discrete versions of techniques already in existence in the 
array processing conmiunity. The first is based on the MUSIC algorithm, and results in rank aware thresholding. 
The second is an extension of OMP, that is rank aware. As in the SMV problem, the thresholding approach is 
simpler to implement, however the OMP based approach leads to improved recovery. 

A. Rank Aware Thresholding 

The simplest SMV sparse recovery algorithm selects the ith atom, i G Ctif |0f y| is greater than some threshold, 
6 and then recovers an estimate of x using the pseudo-inverse, x = ^^y- While this idea was extended to the 
MMV problem in [27] by considering the thresholded g-norm, ||0fY||g, we noted above, that as with SOMP, 
such an approach is not rank aware. In contrast, we have seen a rank aware version of thresholding selection in 
Section IV-B for the full rank case, in the form of the discrete version of MUSIC. Although it was stated there 
for the case rank(Y) = k we can equally use the algorithm when rank(Y) < k, albeit with reduced performance. 
The idea is surprisingly simple: instead of considering ||0j^Y||g, we replace Y by an orthonormal basis for "/^(Y), 
resulting in U||2 where U is chosen to have orthonormal columns such that 7?.(U) = TZiY). The pseudocode 



VI. Rank Aware MMV Algorithms 
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Algorithm 2 Rank Aware Thresholding 
1: Calculate U = orth(Y) an orthonormal basis for Ti{Y); 

2: Calculate = {i : \\cj)JU\\2 > Ok} where the threshold, 9k, is set to select the k largest values; 
3: X = *J,Y. 



is summarized in Algorithm 2. When k is unknown a fixed threshold, 9, can be applied to ||(/>J^U||2 to estimate 
the support. 

Note that, when rank(Y) = 1, rank aware thresholding reduces to standard thresholding. However, in the higher 
rank scenarios, unlike the standard joint thresholding techniques [27], the £^-norm is the natural choice as this is 
the only norm that is invariant to the (arbitrary) choice of orthonormal basis, U used to represent Ti{Y). In the 
full rank case. Algorithm 2 is identical to MUSIC. 

The key full rank property of this algorithm that we discussed in section IV-B was first proved in [29] and is 
well known in array signal processing [39]. We summarize this result in our notation. 

Theorem 5 (Feng [29]). Let Y = *X with |supp(X)| = k, rank(X) = k and k < spark(*) - 1. Then Rank 
Aware Thresholding is guaranteed to recover X (i.e. X = XJ. 

A direct consequence of the theorem is that m ~ k + 1 measurements are sufficient to recover X, as long as $ 
has full spark, namely, that all sets of fc + 1 columns are linearly independent. 

1) Practical subspace estimation: As noted in [29], [34], due to the stability of invariant subspaces [40] the 
MUSIC based recovery is also robust to noise and other perturbations, though to the best of our knowledge the 
degree of robustness in the MMV context has not yet been fully quantified. 

This, of course requires Algorithm 2 to be modified to consider the practical issue of estimating the rank of the 
sparse component and its signal subspace in the presence of measurement and/or numerical error. In this case we 
have Y — $X + N where N is some measurement error. The estimation of the signal subspace can be performed 
through a truncation of the eigenvalue decomposition or SVD of YY^. It is usual to truncate the dimension of 
the subspace based upon the eigenvalues to distinguish between the signal and the noise subspaces. In classical 
MUSIC this truncation is linked to the estimation of the number of targets (equivalent to the sparsity in our model), 
however, when Algorithm 2 is used in the reduced rank setting it is necessary to divorce the estimated rank of the 
signal subspace from the sparsity level k. 

If N is considered as a deterministic error term such truncation can be done based upon a known bound for NN^. 
An alternative approach is to treat N as stochastic. This is the approach usually favoured in array processing where 
the problem has received considerable attention; see for example [33]. Numerous techniques have been proposed 
including sequential hypothesis tests, information based criteria and eigenvector detection tests. As the focus of this 
paper is not on the noisy scenario we do not consider these in more detail here and instead point the interested 
reader to [33] and the references therein. 
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B. Rank Aware Pursuits 

Thresholding techniques can generally be refined through the use of pursuit based methods. These involve 
iteratively selecting a single atom at a time, calculating an approximate solution for X and the residual R, and 
then selecting a new atom. This is repeated until completion. The exact form of the pursuit algorithm depends on 
the selection strategy and the refinement step. 

Here we introduce a selection strategy based upon the Rank Aware thresholding approach above. 

Rank Aware Selection Given a residual R("~^) at the (n — l)th iteration we define the following selection 
strategy at the nth iteration: 

Q{n) ^ uargmax||0fu("-i)||2, (26) 

i 

where u("-i) = orth(R("-i)). 

As with the rank aware thresholding, we restrict our attention to the 2-norm since this is invariant to the 
orthonormal basis, U. 

The main difference between rank aware selection and standard OMP type algorithms, is that the criterion is 
based on the inner products with U^"^, rather than R^"^. This is the same idea that we used in rank aware 
thresholding: instead of comparing with the given measurements Y or the resulting residuals R("\ we compare 
with an orthonormal basis for the span. This simple difference, which is easy to implement in practice, is enough to 
allow for rank awareness. Furthermore, as we will see in the simulations section, it results in enhanced performance. 

Below, we introduce two algorithms that use the rank aware thresholding principle. The first is a natural 
generalization of SOMP, where we simply use the rank aware selection step. Although we will see that this 
substitution improves the performance considerably, it does not result in full rank awareness. In the full rank case, 
perfect recovery from k + 1 measurements under the spark condition is not guaranteed (or typically observed as we 
will see in Section Vll). As we show, this is a result of rank degeneration of the residual: the rank of the residual 
generally decreases in each iteration. To rectify this, we propose a modified selection step which forces the sparsity 
of the residual to decrease along with its rank. This ensures that the residual has full rank at each iteration, and 
results in a fully rank aware OMP-type algorithm. In simulations, we will see that this approach tends to have the 
best performance for all choices of the rank of X. 

Similar to other MP techniques, successful selection of an atom from the correct support set z € requires 
the following condition: 

^^^^^W^l^W^ < 1. (27) 
maxjen 110, U||2 

We begin by noting that for successful selection, i.e. for (27) to hold, the ERC is a sufficient but not necessary 
condition. Specifically we have the following proposition. 

Proposition 1. Let Y = ^>X supp(X) — ^, \Q\ ^ k and rank(Y) = r. Then: 

1) 111 < 1 is a sufficient condition for Rank Aware selection to correctly choose an atom i G 
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2) IfT>k + l- \ supp(*Jj0j*)| /or all <j>j* := argmax^^. ||i, j ^ f2 then 

max|l*|.^J|i < 1 (28) 

is not a necessary condition for correct atom selection. 

3) When t = k then k < spark(*) — 1 is necessary and sufficient for rank aware selection to correctly select 
an atom with index i G f2. 

Proof: The proof of part 1 that ERC is sufficient is identical to that for SOMP, given in [11] and is based 
upon standard norm inequalities. 

Let (j>ji, := argmax^ j ^ O. We can bound the norm U||2 for any U as follows: 

\\<Pj*^h = ^^ — — 

ll*nUx|U ||x||2 (29) 



x#0 |lx||2 

= ||*t,</,^..||imax||0fU||2 

Therefore, ERC is sufficient for recovery. However, unhke SOMP, the norm equahties cannot necessarily be 
approached since U is constrained to be colunm orthonormal. 

For the equality in (29) we require x to simultaneously maximize two quantities. To maximize ||*qUx||oo/||x||2 
X must satisfy: 

X oc \J'^<Pi* (30) 

for some cf)^^, — argmax^ ||(/)J'U||2, i G fi. Secondly to maximize |($q0^„)-^$qUx|/||$qUx||oo we require: 

*SUx « sgn(*J,0^.O. (31) 

Let us define the vector v oc [*Q]^sgn($Q0j-,), ||v||2 = 1. Combining the two constraints gives: 

UU^</)i, oc V. (32) 

This implies that v lies within the range of U and is the closest vector within U to 0j« . However close examination of 
V shows that it is equally close to any cf)^ for which i e supp($Q0j-, ) C fl. Hence all atoms 0j, i G | supp(*Q(/>j* ) | 
are equally as close to U, and all must satisfy (32). 
Now define the subspace W as: 

W := {w : w = ^ai(0i,v)0.,i G supp(*f;0j.O, = 0}. (33) 

i i 

Note that by definition U^w = 0. However by inspection we also have dim(W) = | supp(^Q0j,)| — 1. Since 
dim(U) = T this is only possible if r < A; + 1 — | supp($Q0 i)!. 
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Thus when r > A; — 1 + 1 supp(*[j0j* ) | equahty in (29) cannot be achieved. Furthermore since the set of column 

orthonormal matrices U is compact a maximum must exist: 

maxj^Q ||0^U||2 t 

max = c < max ^o<P,- 1- (34) 

u maxien ||</>rU||2 " 

Therefore, (28) is not a necessary condition for correct atom selection, which proves part 2. 

Finally when r = A; the necessary and sufficient conditions in part 3 for correct atom selection follow from the 
rank aware thresholding result. Theorem 5, and the identifiabiUty conditions in Theorem 2. ■ 

Proposition 1 shows that in the worst case scenario we lose nothing by incorporating the orthogonalization within 
the selection step. Furthermore part 2 shows that in general the selection is more effective than that of SOMP, 
although we have not quantified by how much as this seems diffficult to estimate for general r. The exception is, of 
course, when r = A;. In this case, RA-selection inherits the desirable property from the discrete MUSIC algorithm 
that correct detection is guaranteed. 

1) (Partially) Rank Aware OMP: One possible approach to developing a Rank Aware Pursuit would be to 
substitute (26) as the selection step within SOMP (step 3). We call this algorithm RA-OMP. In Section VII we will 
see that the incorporation of the rank aware selection step substantially improves the average recovery performance 
over SOMP. However, curiously even when rank(Y) = k RA-OMP is not guaranteed to exactly recover X. That 
is, it does not achieve the performance of Rank Aware Thresholding and is not fully rank aware. 

This can be explained by the fact that the rank of the residual deteriorates at each step of RA-OMP, a process 
we call rank degeneration. When selecting the first atom we have R^"^ = Y and from Proposition 1, assuming that 
rank(Y) = k, we are guaranteed to select an atom such that i The updated residual is R^^^ = (7— )Y. 
Since 7?.(Y) = 7?.(*q), the rank will be reduced by one such that rank(R(^)) = k—l. With a httle manipulation 
we can write R^^^ as follows: 

R(l) = <t>i^0,:+4>i{ E ('/>J'/>i)X,,:)- (35) 

In general, however, the second term in (35) will not be zero and so R^^^ will still be fc-joint sparse. This means that 
at the second iteration the selection step will not have guaranteed recovery since rank(R(^)) = k — l. Furthermore, 
if the first r iterations have correctly selected atoms from f2. then rank(R('^)) = fc — r while R^'"' will still generally 
be k joint sparse. This imphes that the performance of the SOMP selections will tend to degenerate towards SMV 
performance as the number of iterations grows. 

2) Rank Aware Order Recursive Matching Pursuit: We can rectify the rank degeneration problem using a 
modified selection step or equivalently modifying the dictionary at each step. The idea is to force the sparsity of 
the residual to decrease along with its rank, so that Lemma 1 can still be apphed. The mechanism that we will 
use for this has already been used in the SMV problem and goes by various names including: Orthogonal Least 
Squares [41] (since it selects the atom that minimizes the residual in the least squares sense at each step), and Order 
Recursive Matching Pursuit (ORMP) [3] (note that historically OMP and ORMP have been repeatedly confused 
for each other. For a potted history of the subject see [42]). While an MMV extension of ORMP was presented in 
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Algorithm 3 Rank Aware Order Recursive Matching Pursuit (RA-ORMP) 

1: Initialize R(°) = Y, X(°) = 0, 11° = and 0^ = 0^ for all i. 

2: for n = 1; n := n + 1 until stopping criterion do 

3: Calculate orthonormal basis for residual: U^""^) = Orth(R("-i)) 

4: = argma^^f,(„_.) ||0f U("-i) ||2 

5: f2" = U i" 

6: X[,"j . = *t,„Y 

7: Calculate orthogonal projector: -Pq(„) := (I — *Q(n) ) 

8: R(")=P^^„Y 

10: Renormalize 0^ = 0-/||0-||2, for i ^ fif""!). 

11: end for 



[10], it is based on similar norm extensions to those used in SOMP and therefore is similarly rank bUnd. Here we 
present a Rank Aware Order Recursive Matching Pursuit (RA-ORMP), and show that it has guaranteed recovery 
in the fuU rank case, as well as empirically exhibiting improved performance for all values of rank of X. The 
pseudocode for RA-ORMP is given in Algorithm 3. 

In practice, we do not calculate the projections as detailed above and instead use a Gramm-Schmidt orthogo- 
nalization procedure as in the standard implementation of ORMP [42]. Furthermore, a practical implementation of 
RA-ORMP would also need to incorporate a estimate of the signal subspace in step 3 of Algorithm 3 along the 
same lines as that proposed in section VI-Al. 

Note the key difference between RA-OMP and RA-ORMP is that we not only project the observed data Y 
orthogonal to the selected atoms to calculate the residual, we also project the remaining dictionary atoms (step 9) 
and then, crucially renormalize them (step 10), so that all atoms are again unit norm. 

We next show that, like RA- thresholding, RA-ORMP will exactly recover X in the full rank scenario: 

Theorem 6 (RA-ORMP, full rank case). Let Y = *X with \ supp(X)| = k, rank(X) = k and k < spark(*) - 1. 
Then RA-ORMP is guaranteed to recover X (i.e. X = XJ. 

Proof: From the spark condition we know that cf)^ ^ 7?.(*o) for j ^ CI and thus X is identifiable. From 
Proposition 1 and the rank assumption, the selection at the first step is successful. It is therefore sufficient to show 
that the updated residual provides another full rank identifiable problem. 

Suppose that we select index i e fi at the first step. The new residual is then: 

R(i) = P^(„, Y. (36) 
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Therefore rank(R(^') = fc — 1. If we now expand Y in terms of the atoms (f>j we get: 

jen\i 

= h'^i,-- (38) 

jen\i 

where Xj,: is the jth row of X rescaled by ||Pj|(„)0j||^^. Thus R(^^ has a (A; — 1) sparse representation within 
the modified dictionary ^. 

Finally we need to show that X is also identifiable. Since 0^- ^ TZ{^n) for j ^ Cl, we know that ||-Pq 0^ II2 > 0. 
However this property is preserved under the projection of the dictionary since -Pq -P^"^ = Pq for aU i e O, so that 
X is also identifiable. Recursively applying the above arguments until R(^^ = completes the proof. ■ 

We see that RA-ORMP does not suffer the same rank degeneration that RA-OMP does and provides a natural 
rank aware algorithm that reduces to ORMP in the SMV case while achieving guaranteed recovery in the fuU rank 
MMV case when rank(Y) = k. 

ORMP is usually championed as superior to OMP, since at each step it selects the atom that most decreases the 
size of the residual. Furthermore, empirical evidence suggests that ORMP generally outperforms OMP (shghtly) 
but at the expense of additional computation. However, to our knowledge, there is no study of the necessary and 
sufficient recovery conditions for ORMP. It is easy to see that the ERC provides a necessary condition since in 
the first selection step OMP and ORMP are identical. Curiously though, it is not clear whether the ERC condition 
holding for * implies that it wiU also hold for the modified dictionary ^. 

C. Link with sequential MUSIC techniques 

Within the array processing literature a number of sequential variants of MUSIC have previously been proposed 
[43], [44], [45], which relate to our derivations here. In [43] a sequential MUSIC algorithm is introduced which 
can be thought of as a continuous parameter version of a Rank Aware Matching Pursuit, i.e. RA-OMP, but without 
the orthogonahzation step. In [45] an algorithm with the painful name of Recursively Applied and Projected (RAP) 
MUSIC was introduced. This can be thought of as a continuous parameter version of RA-ORMP. However, because 
in array processing the aim is to estimate a continuous parameter vector associated with the directions of arrival for 
multiple sources the type of analysis performed on these algorithms is very different to the exact recovery results 
presented here for the discrete sparsity model. 

VII. Numerical Experiments 

In this section we explore the empirical performance of RA-thresholding, RA-OMP and RA-ORMP. We contrast 
these with results for the rank bUnd recovery algorithm: SOMP (using p = 2). For comparisons with mixed 
norm minimization we point the reader to the comparisons performed in [13], where empirically SOMP generally 
exhibited superior performance. 
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To test the four algorithms we consider the family of random matrices $ whose elements were drawn indepen- 
dently from a normal distribution: ^ A/'(0, 1). The columns of $ were then normalized so that ||0j||2 = 1- 
The dimensions of $ were fixed to n = 256 and m = 32, while the number of measurement vectors, I, was varied 
between 1 and 32. Finally the non-zero entries of X were also draw independently from a unit variance normal 
distribution. Note this last condition immediately imphes that, with probability one, rank(Y) = /. 



SOMP 



RA-OMP 




10 20 
Sparsity k 
RA-ORMP 



10 20 
Sparsity k 



10 20 
Sparsity k 
RA-tliresliolding 




10 20 
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Fig. 1. The empirical probability of recoveiy for SOMP, RA-OMP, RA-ORMP and RA-thresholding as a function of sparsity level k. The 
curves in each plot relate to / = 1, 2, 4, 16 and 32 (from left to right). 



Plots of the empirical probability of recovery for the four algorithms are given in Figure 1 . It is clear from these 
plots that while RA-OMP appears to be superior to SOMP for moderate numbers of measurement vectors {1 — 2,4) 
both SOMP and RA-OMP appear to stall at around the same sparsity level and fail to recover vectors with k > 11. 

In contrast to this, both RA-ORMP and RA-thresholding demonstrate full recovery in the full rank case, including 
when Z = 32 up to a sparsity level of A; = 31 as predicted by the theory. However RA-thresholding does not appear 
to provide recovery much beyond the full rank condition. That is, recovery drops immediately once k > I. As with 
the SMV thresholding, better performance is achievable if we restrict the dynamic range of the nonzero coefficients. 

The performance of RA-ORMP clearly provides the best recovery. It uniquely appears to be able to achieve OMP 
type performance in the SMV case and consistently provides recovery well beyond the full rank case. For example 
when / = 16 correct recovery is maintained up to fc = 23. 

In Figure 2 we show explicitly the improvement in recovery performance as a function of / for the same setup 
as above with a fixed sparsity level of fc = to/2 = 16. Note that for I < 16 the rank of Y is equal to /, while 
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for I > 16 the rank remains constant at 16. For this level of sparsity none of the algorithms achieve significant 
recovery rates in the single measurement vector case. However, as the number of measurements and the rank of Y 
is increased all algorithms get improved recovery. The figure highlights that the rank aware algorithms are clearly 
able to exploit the rank information and the recovery rate grows to 100% when the data matrix Y achieves maximal 
rank. What is particularly striking though is how quickly the rank information improves the recovery rate in the 
RA-ORMP algorithm even when the rank of Y is significantly below the maximal rank. 

In contrast, the recovery rates for SOMP and RA-OMP (recall this is not fully rank aware) do not reach 100% 
ever. Interestingly these plots suggest that the rank information is still playing some role in the recovery performance 
of SOMP as the dominant increase in its performance occurs during the range of increasing rank. When I > 16 
and the rank remains fixed the performance appears to plateau at around 80% recovery. Additional simulations (not 
shown) indicate that further increasing the number of measurement vectors does not enable SOMP to pass this 
bound. Curiously, although the performance of RA-OMP increases much more rapidly than that of SOMP at first, 
it appears to stall at the same recovery rate as that for SOMP, suggesting that the rank degeneration identified in 
Section VI-B introduces a bottleneck to the recovery performance for RA-OMP. 




Number of measurement vectors, I 

Fig. 2. The empirical probability of recovery for SOMP, RA-OMP. RA-ORMP and RA-thresholding as a function of number of measurement 
vectors for a fixed sparsity level, k = m/2 = 16. 

VIII. Discussion 

In this paper we considered the role of the rank of the coefficient matrix in the sparse recovery problem with 
multiple measurement vectors (MMV). We began by reviewing known results for the sufficient conditions for 
identifiability of the MMV problem and showed that these conditions are also necessary. We then demonstrated 
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that algorithmically the rank information can be used to reduce the complexity of the general combinatorial search 
procedure. 

Next we turned our attention to practical algorithms. The MMV problem has already received considerable study. 
However we have shown that the most popular classes of algorithm, g-SOMP and mixed (} /^'^ minimization, are 
both rank bhnd. Similar arguments apply to related algorithms (e.g. mixing ^^/f * minimization) but we have omitted 
the details. Indeed, to our knowledge, none of the existing popular algorithms being used for joint sparse recovery 
are rank aware. This seemed to be a serious shortcoming of such techniques. 

We therefore developed some rank aware greedy algorithms and derived certain worst case recovery conditions 
for these algorithms. One might (rightly) criticise such analysis as overly pessimistic however we stress that, in the 
full rank case, the worst case performance of RA-thresholding and RA-ORMP still significantly outperforms the 
average case performance for the most popular (rank blind) algorithms [13]. Such are the benefits of exploiting the 
rank information. 

In this paper we have focused on greedy rank aware algorithms. This is because we were unable to formulate a 
convex rank aware algorithm. Indeed an interesting open question is: does a convex rank aware recovery algorithm 
exist that can interpolate between ii minimization when I = 1 and guaranteed recovery when rank(Y) = k? 
Similarly we could ask whether other popular sparse recovery algorithms such as CoSaMP [8] or Iterative Hard 
Thresholding [46], [47] can be adapted to create new rank aware recovery algorithms for the MMV problem. 

In future work we plan to analyze the recovery properties of the proposed algorithms in more detail for the region 
1 < rank(X) < k. It is also important to quantify the typical performance of the algorithms rather than the worst 
case scenario and to account for the effects of noise. We expect an analysis similar to that in [27], [13] would be 
appropriate here. 
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